{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#hide\n",
    "#skip\n",
    "! [ -e /content ] && pip install -Uqq fastai  # upgrade fastai on colab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#default_exp data.external"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#export\n",
    "from fastai.torch_basics import *\n",
    "from fastdownload import FastDownload\n",
    "from functools import lru_cache\n",
    "import fastai.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# External data\n",
    "> Helper functions to download the fastai datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To download any of the datasets or pretrained weights, simply run `untar_data` by passing any dataset name mentioned above like so: \n",
    "\n",
    "```python \n",
    "path = untar_data(URLs.PETS)\n",
    "path.ls()\n",
    "\n",
    ">> (#7393) [Path('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/keeshond_34.jpg'),...]\n",
    "```\n",
    "\n",
    "To download model pretrained weights: \n",
    "```python \n",
    "path = untar_data(URLs.PETS)\n",
    "path.ls()\n",
    "\n",
    ">> (#2) [Path('/home/ubuntu/.fastai/data/wt103-bwd/itos_wt103.pkl'),Path('/home/ubuntu/.fastai/data/wt103-bwd/lstm_bwd.pth')]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " A complete list of datasets that are available by default inside the library are: "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Main datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.    **ADULT_SAMPLE**: A small of the [adults dataset](https://archive.ics.uci.edu/ml/datasets/Adult) to  predict whether income exceeds $50K/yr based on census data. \n",
    "-    **BIWI_SAMPLE**: A [BIWI kinect headpose database](https://www.kaggle.com/kmader/biwi-kinect-head-pose-database). The dataset contains over 15K images of 20 people (6 females and 14 males - 4 people were recorded twice). For each frame, a depth image, the corresponding rgb image (both 640x480 pixels), and the annotation is provided. The head pose range covers about +-75 degrees yaw and +-60 degrees pitch. \n",
    "1.    **CIFAR**: The famous [cifar-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset which consists of 60000 32x32 colour images in 10 classes, with 6000 images per class.      \n",
    "1.    **COCO_SAMPLE**: A sample of the [coco dataset](http://cocodataset.org/#home) for object detection. \n",
    "1.    **COCO_TINY**: A tiny version of the [coco dataset](http://cocodataset.org/#home) for object detection.\n",
    "-    **HUMAN_NUMBERS**: A synthetic dataset consisting of human number counts in text such as one, two, three, four.. Useful for experimenting with Language Models.\n",
    "-    **IMDB**: The full [IMDB sentiment analysis dataset](https://ai.stanford.edu/~amaas/data/sentiment/).          \n",
    "\n",
    "-    **IMDB_SAMPLE**: A sample of the full [IMDB sentiment analysis dataset](https://ai.stanford.edu/~amaas/data/sentiment/). \n",
    "-    **ML_SAMPLE**: A movielens sample dataset for recommendation engines to recommend movies to users.            \n",
    "-    **ML_100k**: The movielens 100k dataset for recommendation engines to recommend movies to users.             \n",
    "-    **MNIST_SAMPLE**: A sample of the famous [MNIST dataset](http://yann.lecun.com/exdb/mnist/) consisting of handwritten digits.        \n",
    "-    **MNIST_TINY**: A tiny version of the famous [MNIST dataset](http://yann.lecun.com/exdb/mnist/) consisting of handwritten digits.                   \n",
    "-    **MNIST_VAR_SIZE_TINY**:  \n",
    "-    **PLANET_SAMPLE**: A sample of the planets dataset from the Kaggle competition [Planet: Understanding the Amazon from Space](https://www.kaggle.com/c/planet-understanding-the-amazon-from-space).\n",
    "-    **PLANET_TINY**: A tiny version  of the planets dataset from the Kaggle competition [Planet: Understanding the Amazon from Space](https://www.kaggle.com/c/planet-understanding-the-amazon-from-space) for faster experimentation and prototyping.\n",
    "-    **IMAGENETTE**: A smaller version of the [imagenet dataset](http://www.image-net.org/) pronounced just like 'Imagenet', except with a corny inauthentic French accent. \n",
    "-    **IMAGENETTE_160**: The 160px version of the Imagenette dataset.      \n",
    "-    **IMAGENETTE_320**: The 320px version of the Imagenette dataset. \n",
    "-    **IMAGEWOOF**: Imagewoof is a subset of 10 classes from Imagenet that aren't so easy to classify, since they're all dog breeds.\n",
    "-    **IMAGEWOOF_160**: 160px version of the ImageWoof dataset.        \n",
    "-    **IMAGEWOOF_320**: 320px version of the ImageWoof dataset.\n",
    "-    **IMAGEWANG**: Imagewang contains Imagenette and Imagewoof combined, but with some twists that make it into a tricky semi-supervised unbalanced classification problem\n",
    "-    **IMAGEWANG_160**: 160px version of Imagewang.        \n",
    "-    **IMAGEWANG_320**: 320px version of Imagewang. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Kaggle competition datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. **DOGS**: Image dataset consisting of dogs and cats images from [Dogs vs Cats kaggle competition](https://www.kaggle.com/c/dogs-vs-cats). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Image Classification datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.    **CALTECH_101**: Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. Collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc 'Aurelio Ranzato.\n",
    "1.    CARS: The [Cars dataset](https://ai.stanford.edu/~jkrause/cars/car_dataset.html) contains 16,185 images of 196 classes of cars.   \n",
    "1.    **CIFAR_100**: The CIFAR-100 dataset consists of 60000 32x32 colour images in 100 classes, with 600 images per class.   \n",
    "1.    **CUB_200_2011**: Caltech-UCSD Birds-200-2011 (CUB-200-2011) is an extended version of the CUB-200 dataset, with roughly double the number of images per class and new part location annotations\n",
    "1.    **FLOWERS**: 17 category [flower dataset](http://www.robots.ox.ac.uk/~vgg/data/flowers/) by gathering images from various websites.\n",
    "1.    **FOOD**:         \n",
    "1.    **MNIST**: [MNIST dataset](http://yann.lecun.com/exdb/mnist/) consisting of handwritten digits.      \n",
    "1.    **PETS**: A 37 category [pet dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/) with roughly 200 images for each class."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### NLP datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.    **AG_NEWS**: The AG News corpus consists of news articles from the AG’s corpus of news articles on the web pertaining to the 4 largest classes. The dataset contains 30,000 training and 1,900 testing examples for each class.\n",
    "1.    **AMAZON_REVIEWS**: This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.\n",
    "1.    **AMAZON_REVIEWS_POLARITY**: Amazon reviews dataset for sentiment analysis.\n",
    "1.    **DBPEDIA**: The DBpedia ontology dataset contains 560,000 training samples and 70,000 testing samples for each of 14 nonoverlapping classes from DBpedia. \n",
    "1.    **MT_ENG_FRA**: Machine translation dataset from English to French.\n",
    "1.    **SOGOU_NEWS**: [The Sogou-SRR](http://www.thuir.cn/data-srr/) (Search Result Relevance) dataset was constructed to support researches on search engine relevance estimation and ranking tasks.\n",
    "1.    **WIKITEXT**: The [WikiText language modeling dataset](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/) is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia.  \n",
    "1.    **WIKITEXT_TINY**: A tiny version of the WIKITEXT dataset.\n",
    "1.    **YAHOO_ANSWERS**: YAHOO's question answers dataset.\n",
    "1.    **YELP_REVIEWS**: The [Yelp dataset](https://www.yelp.com/dataset) is a subset of YELP businesses, reviews, and user data for use in personal, educational, and academic purposes\n",
    "1.    **YELP_REVIEWS_POLARITY**: For sentiment classification on YELP reviews."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Image localization datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.    **BIWI_HEAD_POSE**: A [BIWI kinect headpose database](https://www.kaggle.com/kmader/biwi-kinect-head-pose-database). The dataset contains over 15K images of 20 people (6 females and 14 males - 4 people were recorded twice). For each frame, a depth image, the corresponding rgb image (both 640x480 pixels), and the annotation is provided. The head pose range covers about +-75 degrees yaw and +-60 degrees pitch. \n",
    "1.    **CAMVID**: Consists of driving labelled dataset for segmentation type models.\n",
    "1.    **CAMVID_TINY**: A tiny camvid dataset for segmentation type models.\n",
    "1.    **LSUN_BEDROOMS**: [Large-scale Image Dataset](https://arxiv.org/abs/1506.03365) using Deep Learning with Humans in the Loop\n",
    "1.    **PASCAL_2007**: [Pascal 2007 dataset](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/) to recognize objects from a number of visual object classes in realistic scenes.\n",
    "1.    **PASCAL_2012**: [Pascal 2012 dataset](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) to recognize objects from a number of visual object classes in realistic scenes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Audio classification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. **MACAQUES**: [7285 macaque coo calls](https://datadryad.org/stash/dataset/doi:10.5061/dryad.7f4p9) across 8 individuals from [Distributed acoustic cues for caller identity in macaque vocalization](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4806230).\n",
    "2. **ZEBRA_FINCH**: [3405 zebra finch calls](https://ndownloader.figshare.com/articles/11905533/versions/1) classified [across 11 call types](https://link.springer.com/article/10.1007/s10071-015-0933-6). Additional labels include name of individual making the vocalization and its age."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Medical imaging datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. **SIIM_SMALL**: A smaller version of the [SIIM dataset](https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/overview) where the objective is to classify pneumothorax from a set of chest radiographic images.\n",
    "2. **TCGA_SMALL**: A smaller version of the [TCGA-OV dataset](http://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ) with subcutaneous and visceral fat segmentations. Citations:\n",
    "\n",
    "    Holback, C., Jarosz, R., Prior, F., Mutch, D. G., Bhosale, P., Garcia, K., … Erickson, B. J. (2016). Radiology Data from The Cancer Genome Atlas Ovarian Cancer [TCGA-OV] collection. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ\n",
    "\n",
    "    Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. https://link.springer.com/article/10.1007/s10278-013-9622-7"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pretrained models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.    **OPENAI_TRANSFORMER**: The GPT2 Transformer pretrained weights.\n",
    "1.    **WT103_FWD**: The WikiText-103 forward language model weights.\n",
    "1.    **WT103_BWD**: The WikiText-103 backward language model weights."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Config"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# export\n",
    "@lru_cache(maxsize=None)\n",
    "def fastai_cfg():\n",
    "    \"`Config` object for fastai's `config.ini`\"\n",
    "    return Config(Path(os.getenv('FASTAI_HOME', '~/.fastai')), 'config.ini', create=dict(\n",
    "        data = 'data', archive = 'archive', storage = 'tmp', model = 'models'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a basic `Config` file that consists of `data`, `model`, `storage` and `archive`. \n",
    "All future downloads occur at the paths defined in the config file based on the type of download. For example, all future fastai datasets are downloaded to the `data` while all pretrained model weights are download to `model` unless the default download location is updated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('data', Path('/home/jhoward/.fastai/data'))"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cfg = fastai_cfg()\n",
    "cfg.data,cfg.path('data')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# export\n",
    "def fastai_path(folder):\n",
    "    \"Path to `folder` in `fastai_cfg`\"\n",
    "    return fastai_cfg().path(folder)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Path('/home/jhoward/.fastai/archive')"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fastai_path('archive')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## URLs -"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#export\n",
    "class URLs():\n",
    "    \"Global constants for dataset and model URLs.\"\n",
    "    LOCAL_PATH = Path.cwd()\n",
    "    MDL = 'http://files.fast.ai/models/'\n",
    "    GOOGLE = 'https://storage.googleapis.com/'\n",
    "    S3  = 'https://s3.amazonaws.com/fast-ai-'\n",
    "    URL = f'{S3}sample/'\n",
    "\n",
    "    S3_IMAGE    = f'{S3}imageclas/'\n",
    "    S3_IMAGELOC = f'{S3}imagelocal/'\n",
    "    S3_AUDI     = f'{S3}audio/'\n",
    "    S3_NLP      = f'{S3}nlp/'\n",
    "    S3_COCO     = f'{S3}coco/'\n",
    "    S3_MODEL    = f'{S3}modelzoo/'\n",
    "\n",
    "    # main datasets\n",
    "    ADULT_SAMPLE        = f'{URL}adult_sample.tgz'\n",
    "    BIWI_SAMPLE         = f'{URL}biwi_sample.tgz'\n",
    "    CIFAR               = f'{URL}cifar10.tgz'\n",
    "    COCO_SAMPLE         = f'{S3_COCO}coco_sample.tgz'\n",
    "    COCO_TINY           = f'{S3_COCO}coco_tiny.tgz'\n",
    "    HUMAN_NUMBERS       = f'{URL}human_numbers.tgz'\n",
    "    IMDB                = f'{S3_NLP}imdb.tgz'\n",
    "    IMDB_SAMPLE         = f'{URL}imdb_sample.tgz'\n",
    "    ML_SAMPLE           = f'{URL}movie_lens_sample.tgz'\n",
    "    ML_100k             = 'https://files.grouplens.org/datasets/movielens/ml-100k.zip'\n",
    "    MNIST_SAMPLE        = f'{URL}mnist_sample.tgz'\n",
    "    MNIST_TINY          = f'{URL}mnist_tiny.tgz'\n",
    "    MNIST_VAR_SIZE_TINY = f'{S3_IMAGE}mnist_var_size_tiny.tgz'\n",
    "    PLANET_SAMPLE       = f'{URL}planet_sample.tgz'\n",
    "    PLANET_TINY         = f'{URL}planet_tiny.tgz'\n",
    "    IMAGENETTE          = f'{S3_IMAGE}imagenette2.tgz'\n",
    "    IMAGENETTE_160      = f'{S3_IMAGE}imagenette2-160.tgz'\n",
    "    IMAGENETTE_320      = f'{S3_IMAGE}imagenette2-320.tgz'\n",
    "    IMAGEWOOF           = f'{S3_IMAGE}imagewoof2.tgz'\n",
    "    IMAGEWOOF_160       = f'{S3_IMAGE}imagewoof2-160.tgz'\n",
    "    IMAGEWOOF_320       = f'{S3_IMAGE}imagewoof2-320.tgz'\n",
    "    IMAGEWANG           = f'{S3_IMAGE}imagewang.tgz'\n",
    "    IMAGEWANG_160       = f'{S3_IMAGE}imagewang-160.tgz'\n",
    "    IMAGEWANG_320       = f'{S3_IMAGE}imagewang-320.tgz'\n",
    "\n",
    "    # kaggle competitions download dogs-vs-cats -p {DOGS.absolute()}\n",
    "    DOGS = f'{URL}dogscats.tgz'\n",
    "\n",
    "    # image classification datasets\n",
    "    CALTECH_101  = f'{S3_IMAGE}caltech_101.tgz'\n",
    "    CARS         = f'{S3_IMAGE}stanford-cars.tgz'\n",
    "    CIFAR_100    = f'{S3_IMAGE}cifar100.tgz'\n",
    "    CUB_200_2011 = f'{S3_IMAGE}CUB_200_2011.tgz'\n",
    "    FLOWERS      = f'{S3_IMAGE}oxford-102-flowers.tgz'\n",
    "    FOOD         = f'{S3_IMAGE}food-101.tgz'\n",
    "    MNIST        = f'{S3_IMAGE}mnist_png.tgz'\n",
    "    PETS         = f'{S3_IMAGE}oxford-iiit-pet.tgz'\n",
    "\n",
    "    # NLP datasets\n",
    "    AG_NEWS                 = f'{S3_NLP}ag_news_csv.tgz'\n",
    "    AMAZON_REVIEWS          = f'{S3_NLP}amazon_review_full_csv.tgz'\n",
    "    AMAZON_REVIEWS_POLARITY = f'{S3_NLP}amazon_review_polarity_csv.tgz'\n",
    "    DBPEDIA                 = f'{S3_NLP}dbpedia_csv.tgz'\n",
    "    MT_ENG_FRA              = f'{S3_NLP}giga-fren.tgz'\n",
    "    SOGOU_NEWS              = f'{S3_NLP}sogou_news_csv.tgz'\n",
    "    WIKITEXT                = f'{S3_NLP}wikitext-103.tgz'\n",
    "    WIKITEXT_TINY           = f'{S3_NLP}wikitext-2.tgz'\n",
    "    YAHOO_ANSWERS           = f'{S3_NLP}yahoo_answers_csv.tgz'\n",
    "    YELP_REVIEWS            = f'{S3_NLP}yelp_review_full_csv.tgz'\n",
    "    YELP_REVIEWS_POLARITY   = f'{S3_NLP}yelp_review_polarity_csv.tgz'\n",
    "\n",
    "    # Image localization datasets\n",
    "    BIWI_HEAD_POSE     = f\"{S3_IMAGELOC}biwi_head_pose.tgz\"\n",
    "    CAMVID             = f'{S3_IMAGELOC}camvid.tgz'\n",
    "    CAMVID_TINY        = f'{URL}camvid_tiny.tgz'\n",
    "    LSUN_BEDROOMS      = f'{S3_IMAGE}bedroom.tgz'\n",
    "    PASCAL_2007        = f'{S3_IMAGELOC}pascal_2007.tgz'\n",
    "    PASCAL_2012        = f'{S3_IMAGELOC}pascal_2012.tgz'\n",
    "\n",
    "    # Audio classification datasets\n",
    "    MACAQUES           = f'{GOOGLE}ml-animal-sounds-datasets/macaques.zip'\n",
    "    ZEBRA_FINCH        = f'{GOOGLE}ml-animal-sounds-datasets/zebra_finch.zip'\n",
    "\n",
    "    # Medical Imaging datasets\n",
    "    #SKIN_LESION        = f'{S3_IMAGELOC}skin_lesion.tgz'\n",
    "    SIIM_SMALL         = f'{S3_IMAGELOC}siim_small.tgz'\n",
    "    TCGA_SMALL         = f'{S3_IMAGELOC}tcga_small.tgz'\n",
    "\n",
    "    #Pretrained models\n",
    "    OPENAI_TRANSFORMER = f'{S3_MODEL}transformer.tgz'\n",
    "    WT103_FWD          = f'{S3_MODEL}wt103-fwd.tgz'\n",
    "    WT103_BWD          = f'{S3_MODEL}wt103-bwd.tgz'\n",
    "\n",
    "    def path(url='.', c_key='archive'):\n",
    "        \"Return local path where to download based on `c_key`\"\n",
    "        fname = url.split('/')[-1]\n",
    "        local_path = URLs.LOCAL_PATH/('models' if c_key=='models' else 'data')/fname\n",
    "        if local_path.exists(): return local_path\n",
    "        return fastai_path(c_key)/fname"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The default local path is at `~/.fastai/archive/` but this can be updated by passing a different `c_key`. Note: `c_key` should be one of `'archive'', 'data', 'model', 'storage'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Path('/home/jhoward/.fastai/archive/oxford-iiit-pet.tgz')"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "url = URLs.PETS\n",
    "local_path = URLs.path(url)\n",
    "test_eq(local_path.parent, fastai_path('archive'))\n",
    "local_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Path('/home/jhoward/.fastai/models/oxford-iiit-pet.tgz')"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "local_path = URLs.path(url, c_key='model')\n",
    "test_eq(local_path.parent, fastai_path('model'))\n",
    "local_path"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## untar_data -"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#export\n",
    "def untar_data(url, archive=None, data=None, c_key='data', force_download=False):#, extract_func=file_extract, timeout=4):\n",
    "    \"Download `url` to `fname` if `dest` doesn't exist, and extract to folder `dest`\"\n",
    "    d = FastDownload(fastai_cfg(), module=fastai.data, archive=archive, data=data, base='~/.fastai')\n",
    "    return d.get(url, force=force_download, extract_key=c_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`untar_data` is a thin wrapper for `FastDownload.get`. It downloads and extracts `url`, by default to subdirectories of `~/.fastai`, and returns the path to the extracted data. Setting the `force_download` flag to 'True' will overwrite any existing copy of the data already present. For an explanation of the `c_key` parameter, see `URLs`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <div>\n",
       "        <style>\n",
       "            /* Turns off some styling */\n",
       "            progress {\n",
       "                /* gets rid of default border in Firefox and Opera. */\n",
       "                border: none;\n",
       "                /* Needs to be in here for Safari polyfill so background images work as expected. */\n",
       "                background-size: auto;\n",
       "            }\n",
       "            .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar {\n",
       "                background: #F44336;\n",
       "            }\n",
       "        </style>\n",
       "      <progress value='3219456' class='' max='3214948' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
       "      100.14% [3219456/3214948 00:02<00:00]\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Path('/home/jhoward/.fastai/data/mnist_sample')"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "untar_data(URLs.MNIST_SAMPLE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#hide\n",
    "#Check all URLs are in the download_checks.py file and match for downloaded archives\n",
    "# from fastdownload import read_checks\n",
    "# fd = FastDownload(fastai_cfg(), module=fastai.data)\n",
    "# _whitelist = \"MDL LOCAL_PATH URL WT103_BWD WT103_FWD GOOGLE\".split()\n",
    "# checks = read_checks(fd.module)\n",
    "\n",
    "# for d in dir(URLs): \n",
    "#     if d.upper() == d and not d.startswith(\"S3\") and not d in _whitelist: \n",
    "#         url = getattr(URLs, d)\n",
    "#         assert url in checks,f\"\"\"{d} is not in the check file for all URLs.\n",
    "# To fix this, you need to run the following code in this notebook before making a PR (there is a commented cell for this below):\n",
    "# url = URLs.{d}\n",
    "# fd.get(url, force=True)\n",
    "# fd.update(url)\n",
    "# \"\"\"\n",
    "#         f = fd.download(url)\n",
    "#         assert fd.check(url, f),f\"\"\"The log we have for {d} in checks does not match the actual archive.\n",
    "# To fix this, you need to run the following code in this notebook before making a PR (there is a commented cell for this below):\n",
    "# url = URLs.{d}\n",
    "# _add_check(url, URLs.path(url))\n",
    "# \"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export -"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Converted 00_torch_core.ipynb.\n",
      "Converted 01_layers.ipynb.\n",
      "Converted 01a_losses.ipynb.\n",
      "Converted 02_data.load.ipynb.\n",
      "Converted 03_data.core.ipynb.\n",
      "Converted 04_data.external.ipynb.\n",
      "Converted 05_data.transforms.ipynb.\n",
      "Converted 06_data.block.ipynb.\n",
      "Converted 07_vision.core.ipynb.\n",
      "Converted 08_vision.data.ipynb.\n",
      "Converted 09_vision.augment.ipynb.\n",
      "Converted 09b_vision.utils.ipynb.\n",
      "Converted 09c_vision.widgets.ipynb.\n",
      "Converted 10_tutorial.pets.ipynb.\n",
      "Converted 10b_tutorial.albumentations.ipynb.\n",
      "Converted 11_vision.models.xresnet.ipynb.\n",
      "Converted 12_optimizer.ipynb.\n",
      "Converted 13_callback.core.ipynb.\n",
      "Converted 13a_learner.ipynb.\n",
      "Converted 13b_metrics.ipynb.\n",
      "Converted 14_callback.schedule.ipynb.\n",
      "Converted 14a_callback.data.ipynb.\n",
      "Converted 15_callback.hook.ipynb.\n",
      "Converted 15a_vision.models.unet.ipynb.\n",
      "Converted 16_callback.progress.ipynb.\n",
      "Converted 17_callback.tracker.ipynb.\n",
      "Converted 18_callback.fp16.ipynb.\n",
      "Converted 18a_callback.training.ipynb.\n",
      "Converted 18b_callback.preds.ipynb.\n",
      "Converted 19_callback.mixup.ipynb.\n",
      "Converted 20_interpret.ipynb.\n",
      "Converted 20a_distributed.ipynb.\n",
      "Converted 21_vision.learner.ipynb.\n",
      "Converted 22_tutorial.imagenette.ipynb.\n",
      "Converted 23_tutorial.vision.ipynb.\n",
      "Converted 24_tutorial.image_sequence.ipynb.\n",
      "Converted 24_tutorial.siamese.ipynb.\n",
      "Converted 24_vision.gan.ipynb.\n",
      "Converted 30_text.core.ipynb.\n",
      "Converted 31_text.data.ipynb.\n",
      "Converted 32_text.models.awdlstm.ipynb.\n",
      "Converted 33_text.models.core.ipynb.\n",
      "Converted 34_callback.rnn.ipynb.\n",
      "Converted 35_tutorial.wikitext.ipynb.\n",
      "Converted 37_text.learner.ipynb.\n",
      "Converted 38_tutorial.text.ipynb.\n",
      "Converted 39_tutorial.transformers.ipynb.\n",
      "Converted 40_tabular.core.ipynb.\n",
      "Converted 41_tabular.data.ipynb.\n",
      "Converted 42_tabular.model.ipynb.\n",
      "Converted 43_tabular.learner.ipynb.\n",
      "Converted 44_tutorial.tabular.ipynb.\n",
      "Converted 45_collab.ipynb.\n",
      "Converted 46_tutorial.collab.ipynb.\n",
      "Converted 50_tutorial.datablock.ipynb.\n",
      "Converted 60_medical.imaging.ipynb.\n",
      "Converted 61_tutorial.medical_imaging.ipynb.\n",
      "Converted 65_medical.text.ipynb.\n",
      "Converted 70_callback.wandb.ipynb.\n",
      "Converted 71_callback.tensorboard.ipynb.\n",
      "Converted 72_callback.neptune.ipynb.\n",
      "Converted 73_callback.captum.ipynb.\n",
      "Converted 74_callback.azureml.ipynb.\n",
      "Converted 97_test_utils.ipynb.\n",
      "Converted 99_pytorch_doc.ipynb.\n",
      "Converted dev-setup.ipynb.\n",
      "Converted app_examples.ipynb.\n",
      "Converted camvid.ipynb.\n",
      "Converted migrating_catalyst.ipynb.\n",
      "Converted migrating_ignite.ipynb.\n",
      "Converted migrating_lightning.ipynb.\n",
      "Converted migrating_pytorch.ipynb.\n",
      "Converted migrating_pytorch_verbose.ipynb.\n",
      "Converted ulmfit.ipynb.\n",
      "Converted index.ipynb.\n",
      "Converted quick_start.ipynb.\n",
      "Converted tutorial.ipynb.\n"
     ]
    }
   ],
   "source": [
    "#hide\n",
    "from nbdev.export import notebook2script\n",
    "notebook2script()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "jupytext": {
   "split_at_heading": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
